MAN-MACHINE  SYSTEMS  LABORATORY
Department  of  Mechanical  Engineering 
Massachusetts  Institute  of  Technology 
Cambridge  MA  02139


Three  experiments  are  reported  in  which  subjects  must  maintain 
a  dynamic  plant  at  or  near  some  desired  state  by  selecting  actions 
based  on  a  display  of  noisy  state  information  and  their  own 
knowledge  of  the  plant  dynamic  characteristics.  In  some  cases,  a 
normatively  derived  state  estimate  was  displayed  as  an  alternate 
information  source,  or  decision  aid.  Results  suggest  that  subjects 
are  often  able  to  simplify  their  own  cognitive  tasks  significantly 
while  degrading  their  overall  task  performance  only  slightly. 
Models of this cognitive behavior include: substituting an adequate
stimulus-response-like algorithm for one which requires
maintenance of an internal state estimate; retaining point estimates
of state when distributions over states are presented; partitioning
the state space so that the model's state space has fewer
states than the actual process; and ignoring certain structural
details of plant behavior which extend beyond analogical
models assumed to be present in the subject. The implications
for the design of decision aids based on state estimation are
discussed.


1.  INTRODUCTION 


As  controlled  processes  become  larger  and  more  complex,  the  operators  must  integrate  larger 
amounts  of  available  information  into  their  own  knowledge,  or  mental  models.  Computer  aids 
may  be  necessary  to  help  the  operator  deal  with  large  amounts  of  available  information  quickly 
and  effectively.  This  work  will  attempt  to  identify  certain  aspects  of  human  cognitive  behavior  of 
interest  to  the  designers  of  computer-based  decision  aids. 

When  the  controlled  process  is  dynamic,  which  by  definition  forces  the  pace  of  events,  the  operator 
may  be  under  pressure  to  make  decisions.  Examples  of  controlled  dynamic  processes  are  power 
plants,  other  process  control  plants,  ships,  and  airplanes.  Adding  to  the  operator’s  difficulty,  if 
the system is in some abnormal state or a state unfamiliar to the operator, his models of the
system may be partially or totally inaccurate.

There  has  been  much  written  on  the  role  of  mental  models  in  problem  solving  and  decision  making 
tasks [Stevens, 1980; Young, 1983; Jagacinski, 1978; Young, 1981; Collins, 1977]. Early studies of
the operator focused on continuous tracking tasks and later ones on the response to changes in
plant dynamics [Niemela, 1975; Miller D.C., 1967]. The performance of subjects at prediction
tasks has also been measured and modeled [Laios, 1978; van Bussel, 1980; van Heusdenn, 1980].
Some authors have studied qualitative reasoning about physical systems [Forbus, 1981; DeKleer,
1975]. Recently, there has been growing interest in modeling the cognitive aspects of the human
operator [Rasmussen, 1976; Greenstein, 1982; Rouse, 1977].

There  are  many  models  of  decision  making  available,  some  being  prescriptive  and  others  being 
descriptive  in  nature.  The  expected  utility  model  for  normative  decision  making  is  probably  the 
least controversial prescriptive model of decision making, yet it is virtually impossible to
implement in real applications [Sage, 1981]. It has been used as the basis for adaptive decision aids
[Freedy, 1976]. There seems still to be a need for studying the relationship between the normative
decision  making  models  and  descriptive  decision  making  models. 


1.1  GOALS  OF  RESEARCH 

The  goal  of  this  work  has  been  to  examine  the  relationship  between  an  operator’s  mental 
model and a computer-based process model for some interesting but limited decision making
context. The context which has been chosen for study is decision making in the control of a
stationary, dynamic process. By making this choice, the discussion of mental models and computer
models  is  restricted  to  a  manageable  level.  Additionally,  there  already  exists  a  large  amount  of 
theoretical  information  about  the  automatic  control  of  stationary  dynamic  processes.  Effectively, 
this  means  that  prescriptive  models  for  decision  making  in  this  context  are  readily  available. 

There is now the choice between looking at a real process or remaining in the laboratory. Consider
three points which make the first alternative less desirable:

1)  If there is no normative model of how the decisions should be made, then there is no way to
    measure the effectiveness of a proposed decision aid.

2)  If there is a normative model of decision making which is both available and satisfactory
    (with respect to some accepted model of the process), then the sub-normative human
    should be replaced with a machine which implements the normative procedure and any
    proposed decision aids are unnecessary.

3)  If a normative procedure exists with respect to a model of the process, there may be
    important differences between the model of the process and the actual process. The presence
    of these differences may be the reason for the human operator. However, these
    differences also make the procedure sub-normative with respect to the real process, so
    any proposed decision aid cannot be evaluated in this situation.

Thus  we  have  the  simple  choice  of  proposing  decision  aids  for  cases  where  their  effectiveness  may 
never  be  evaluated  or  they  are  unnecessary  in  the  first  place,  or  studying  decision  aids  for 
artificially  constructed  processes  in  the  laboratory  where  the  results  may  be  stated  with  some 
confidence,  but  the  extrapolation  to  real  systems  is  in  question.  We  have  chosen  to  remain  in  the 
laboratory  where  normative  decision  algorithms  can  be  developed. 

We  must  mention  that  there  is  not  much  hope  of  modeling  a  human  perfectly.  If  an  experimenter 
manages  to  predict  every  decision  for  a  subject  over  a  finite  number  of  observations,  then  there  is 
always  the  chance  that  the  subject  was  responding  to  the  particular  sequence  presented  and  a 
slight  variation  in  the  stimulus  would  produce  a  vastly  different  response  from  the  subject. 
Ideally,  the  experimenter  searches  for  a  model  which  is  simple  yet  successfully  predicts  the 
human’s  behavior  for  a  large  percentage  of  the  time.  It  is  always  possible  to  attribute  unpredicted 
human  behavior  to  stochastic  elements  within  the  human,  yet  the  basic  idea  behind  modeling  is  to 
be  able  to  view  the  human  as  comprised  of  deterministic  rather  than  stochastic  elements. 

It  is  always  possible  to  draw  up  a  model  with  a  large  number  of  parameters  such  that  our  sequence 
of  experiments  results  in  precisely  the  observed  set  of  responses.  But  such  a  model  does  not  have 
the  characteristics  of  simplicity  or  elegance,  it  would  not  generalize  to  other  situations,  and  it 
would  be  of  limited  interest  in  any  other  setting.  In  modeling,  we  seek  a  balance  between  robust 
predictability  and  number  of  model  parameters.  Ultimately,  the  test  of  any  model  is  to  make 
predictions  in  a  set  of  circumstances  which  have  not  been  tried.  We  will  do  this  with  the  models 
under  development  presently. 

Regarding a descriptive model of the human, there is of course a great deal of literature about
cognitive processes in general, some of it having direct relevance to our discussion. In the next section
we will present a general process model involving probabilistic state transitions. The various
studies of human probabilistic updating and prediction tasks [Laios, van Bussel, van Heusdenn] show
that  the  human  behaves  differently  from  accepted  normative  descriptions.  In  an  oversimplified 
way,  what  is  meant  by  mental  models  versus  computer  models  in  decision  making  is:  how  it  is 
done  versus  how  it  should  be  done. 


2.  SELECTION  OF  A  REPRESENTATIVE  TASK 


As  a  limited  set  of  problems  for  which  the  study  of  decision  making  is  important,  we  have 
chosen  the  regulation  of  a  stationary  dynamic  process  with  stochastic  state  transitions.  The  most 
basic  model  for  this  process  is  given  by  Figure  1.  The  process  has  a  state  which  is  dependent  on 
actions  taken  by  the  operator  but  is  not  directly  observable.  By  regulation  is  meant  that  there  is 
some  goal  state  at  which  the  operator  seeks  to  keep  the  process  by  means  of  his  control  actions. 
More  precisely,  there  is  some  reward  per  unit  time  defined  solely  over  the  states  of  the  process,  and 
it  is  the  goal  of  the  operator  to  select  his  control  actions  so  as  to  maximize  the  average  rate  at 
which  the  reward  is  collected,  that  average  being  taken  over  all  time. 

Figure 1. General Process Model.

The process consists of a dynamic process, $\Phi$, and an output process, $\Psi$. The
observable y(t) is dependent only on the state x(t), which in turn depends on the
last state x(t-dt) and the last control input u(t-dt).


The  stationary  dynamic  process  with  stochastic  state  transitions  is  a  very  general  model  and  we 
choose  to  present  it  mathematically  as, 


$$P(x(t+dt)=x_i) = \sum_{j=1}^{n} \phi_{ij}(u(t)) \, P(x(t)=x_j), \quad i=1,\dots,n \qquad (2.1)$$

$$P(y(t+dt)=y_i) = \sum_{j=1}^{n} \psi_{ij} \, P(x(t+dt)=x_j), \quad i=1,\dots,m$$

where x(t) is the process state at time t, u(t) is the input at time t, y(t) is the observable at time
t, P(a) is the probability of a, n is the number of states and m is the number of possible
observables. The $\phi_{ij}$'s and $\psi_{ij}$'s are process parameters, which are conditional probabilities among
states and outputs:

$$\phi_{ij}(u_k) = P(x(t+dt)=x_i \mid x(t)=x_j,\ u(t)=u_k) \qquad (2.2)$$

$$\psi_{ij} = P(y(t)=y_i \mid x(t)=x_j)$$

Note that these parameters must conform to the usual constraints of a probability distribution
(i.e. $\sum_i \phi_{ij}(u_k) = 1$, $j=1,\dots,n$; and $\sum_i \psi_{ij} = 1$, $j=1,\dots,n$).
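
As a minimal sketch of how a process of the form (2.1) might be simulated, the following Python fragment draws one state transition and one noisy observation per time step. The numerical values in PHI and PSI are hypothetical placeholders chosen only to satisfy the column-sum constraints; they are not the parameters of Appendix 1, and the helper names are illustrative.

```python
import random

# Hypothetical parameters (NOT the values from Appendix 1).
# PHI[k][i][j] = P(x(t+dt)=x_i | x(t)=x_j, u(t)=u_k); each column sums to 1.
PHI = [
    [[0.9, 0.8, 0.1],
     [0.1, 0.1, 0.8],
     [0.0, 0.1, 0.1]],   # first action: drifts toward the first state
    [[0.1, 0.1, 0.0],
     [0.8, 0.1, 0.1],
     [0.1, 0.8, 0.9]],   # second action: drifts toward the third state
]
# PSI[i][j] = P(y(t)=y_i | x(t)=x_j); each column sums to 1.
PSI = [[0.8, 0.1, 0.1],
       [0.1, 0.8, 0.1],
       [0.1, 0.1, 0.8]]


def column(matrix, j):
    """Return column j of a row-major matrix as a list."""
    return [row[j] for row in matrix]


def is_stochastic(matrix, tol=1e-9):
    """Check that every column is a probability distribution summing to 1."""
    return all(abs(sum(column(matrix, j)) - 1.0) < tol
               for j in range(len(matrix[0])))


def draw(probs, rng):
    """Draw an index from a discrete distribution by inverting its CDF."""
    r = rng.random()
    total = 0.0
    for i, p in enumerate(probs):
        total += p
        if r < total:
            return i
    return len(probs) - 1  # guard against rounding at the upper end


def step(state, action, rng):
    """Advance the hidden state one time step and emit an output token."""
    new_state = draw(column(PHI[action], state), rng)
    observation = draw(column(PSI, new_state), rng)
    return new_state, observation
```

Indices are zero-based here, so state index 0 stands for $x_1$ and action index 0 for $u_1$. Passing a seeded `random.Random` instance keeps a simulated trial reproducible.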


In the literature (2.1) is called a partially observable Markov model or an automata model and its
control has been studied [Amram, 1982; Kakalick, 1976]. Figure 2 presents an example of a
partially observable Markov model under the influence of a specific control alternative. Circles
represent possible process states and boxes represent possible observations, sometimes called
output tokens.


Figure 2. A simple Markov model with noisy outputs.

The $\phi_{ij}$ are transition probabilities among states while the $\psi_{ij}$ are the probabilities
of each output conditional on each state. The diagram indicates state transition
behavior for only one input, say $u_2$.

Conceivably,  the  behavior  of  any  stationary  dynamic  process  could  be  represented  by  the  model 
given  in  (2.1).  By  having  stochastic  transitions,  we  do  not  exclude  deterministic  phenomena  in 
which  all  state  transition  probabilities  are  either  zero  or  one.  Specification  of  the  model  does 
require  enumeration  of  all  states,  outputs,  and  transition  and  output  probabilities  for  the  modeled 
process.  In  practice,  of  course,  this  cannot  be  done  for  processes  with  continuous  states.  Therefore, 
we  must  either  study  processes  that  have  a  small  number  of  states  or  be  satisfied  with  a  relatively 
coarse discretization of a continuous process. Note that if such an enumeration of a continuous
process could be done, then the distinction between linear and non-linear process dynamics would
not be important. A linear process model such as $\dot{x} = Ax + Bu$ may be thought of as a shorthand
representation  of  a  large  amount  of  state  transition  information. 

The restriction that the process model is stationary means that the transition and output
probabilities do not change in time. This does not prevent us from using the Markov model for
situations in which failures occur that change the "characteristic" of certain process sub-components. It
does mean that we cannot analyze the effects of unmodeled dynamics on our decision-making
models. In spite of the enumeration problems described above, (2.1) serves as a useful
parameterization of a set of models for which human decision making can be easily studied. Perhaps most
importantly,  it  is  possible  to  specify  normative  control  behavior  for  this  model  and  even  calculate 
it  when  the  number  of  states  is  small.  This  will  be  discussed  in  the  next  section. 


3.  NORMATIVE  CONTROL  OF  THE  MARKOV  PROCESS 


If a utility function is added to the general process model given by (2.1), then normative or
prescriptive behavior may be determined. To study the regulation problem, a scalar valued
reward function, r(t), is defined over the states of the process as follows:

$$r(t) = \sum_{i=1}^{n} r_i \, P(x(t)=x_i) \qquad (3.1)$$


The constant $r_i$ represents the reward issued when the process is in state $x_i$ for one time period.
Reward and utility are used interchangeably in this discussion. The normative control action at
time t for the process specified in (2.1) with the reward function specified in (3.1) is determined as
follows. A belief vector, f(t), is formed and updated after each input-observation pair. Each
component, $f_i(t)$, represents the subjective probability that the process is in state $x_i$ at time t.
The belief vector components are updated according to two influences, that of the last control
action and that of the present observation:

$$f_i'(t) = \sum_{j=1}^{n} \phi_{ij}(u(t-dt)) \, f_j(t-dt) \qquad (3.2)$$

which accounts for the expected process state transitions associated with the deterministic
influence of the control input for each time step. Assuming output $y_k$ is observed at time t, the
state estimate is updated according to a Bayesian update

$$f_i(t) = \frac{\psi_{ki} \, f_i'(t)}{\sum_{j=1}^{n} \psi_{kj} \, f_j'(t)} \qquad (3.3)$$


These  two  parts,  the  prediction  and  update  processes,  may  be  called  the  state  estimate  part  of  the 
normative  control  procedure. 
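
The prediction and update steps can be sketched in a few lines of Python. This is an illustrative reading of the two-part state estimator, assuming the transition matrix for the last action is stored as `phi_u[i][j]` (probability of moving to state i from state j) and the output matrix as `psi[k][i]` (probability of token k given state i); the function names are hypothetical.

```python
def predict(belief, phi_u):
    """Prediction step: propagate the belief through the transition
    matrix phi_u[i][j] = P(next state i | current state j, last action)."""
    n = len(belief)
    return [sum(phi_u[i][j] * belief[j] for j in range(n)) for i in range(n)]


def update(predicted, psi, k):
    """Bayesian update step: reweight the predicted belief by the likelihood
    psi[k][i] = P(token k | state i) of the observed token, then normalize."""
    weighted = [psi[k][i] * predicted[i] for i in range(len(predicted))]
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("observed token has zero probability under the belief")
    return [w / total for w in weighted]


def estimate_step(belief, phi_u, psi, k):
    """One full cycle of the state estimator: predict, then update."""
    return update(predict(belief, phi_u), psi, k)
```

For example, starting from a uniform belief over three states, with an identity transition matrix and an output matrix that reports the true state with probability 0.8, a single observation of the first token moves the belief to (0.8, 0.1, 0.1).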

Each action is then given a ranking according to how much reward can be expected to be derived
from taking that action under the present belief. The action possessing the highest expected
incremental reward is chosen. Let $v_i(t)$ represent the expected incremental reward of taking action
$u_i$ at time t. For the case where a stationary state transition matrix applies, such as that in (2.1),
and the reward function is constant and linear over the states as in (3.1), $v_i(t)$ is a linear function of
the belief components with constant weighting coefficients $w_{ij}$. That is,

$$v_i(t) = \sum_{j=1}^{n} w_{ij} \, f_j(t) \qquad (3.4)$$

For the special case given, the $w_{ij}$ are determined off-line from evaluating an expected incremental
utility algorithm using the plant model. If $v_i(t)$ is larger than all other $v_j(t)$, $j \neq i$, then action $u_i$
should be chosen at time t.
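
A hedged sketch of this decision rule follows. The paper does not give the off-line algorithm for the weighting coefficients, so the helper `one_step_weights` below substitutes a simple one-step lookahead (the expected reward of the state reached after one transition). That choice is an assumption made for illustration only; the actual coefficients may account for rewards further in the future.

```python
def one_step_weights(phi, rewards):
    """Assumed (myopic) weights: w[i][j] is the expected reward one step
    after taking action i from state j, with phi[i][l][j] the probability
    of reaching state l from state j under action i."""
    n = len(rewards)
    return [[sum(rewards[l] * phi_u[l][j] for l in range(n)) for j in range(n)]
            for phi_u in phi]


def choose_action(weights, belief):
    """Evaluate each action's value as a weighted sum of belief components
    and return the index of the largest value together with all values."""
    values = [sum(w * f for w, f in zip(row, belief)) for row in weights]
    best = max(range(len(values)), key=values.__getitem__)
    return best, values
```

Because the weights are constant, they need to be computed only once; each decision then costs one weighted sum per action. Ties are broken in favor of the lower action index, which is also an arbitrary choice.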

The  basic  elements  of  the  normative  control  procedure  are  shown  in  Figure  3.  What  is  important 
to note is that the belief state f(t) consists of a probability (or likelihood) associated with each
state  of  the  process  and  each  of  these  probabilities  is  used  in  a  precise  way  to  determine  the 
present  action  as  well  as  the  probabilities  after  an  additional  observation  is  made. 


Figure  3.  Elements  of  the  Normative  Decision  Algorithm. 

The belief state, consisting of a distribution over all the states of the process, is an
essential  part  of  the  normative  model  of  decision  making  for  the  experimental  task. 


Note  that  if  a  system  had  a  unique  symbolic  representation  of  each  possible  belief  state,  and  it 
"branched"  from  symbol  to  symbol  correctly  for  all  possible  process  output  tokens,  that  system 
would  be  executing  a  normative  decision  algorithm.  This  hypothetical  example  may  well  be  how 
experts  control  a  process.  Due  to  vast  experience,  almost  every  process  state  which  they  encounter 
they  seem  to  recognize  and  may  have  a  unique  verbal  label  for.  Thus  they  have  a  unique  symbolic 
representation  for  each  state  of  the  process.  Feeling  they  know  what  state  the  process  is  in,  they 
may also know what outputs indicate movement to what other states. That is, they have well
calibrated branching rules among process states.

It  is  not  necessary  to  have  a  computer  working  exactly  as  outlined  to  produce  normative  behavior. 
Anything which produces the same input-output (that is, decision making) behavior as
the normative algorithm may be considered a normative decision maker, regardless of the details
of its internal mechanics. Indeed the normative decision making algorithm is itself a finite state
automaton, and numerical representations of belief states within the computer may be considered
to  be  enumerated  symbols  for  each  possible  belief  state. 

3.1  PROPOSED  DECISION  AID  -  A  NORMATIVE  STATE  ESTIMATOR 

In  the  previous  section,  the  components  of  a  normative  algorithm  for  decision  making  in  the 
context  of  controlling  a  stationary  dynamic  process  were  outlined.  These  may  serve  as  a  reference 
for  descriptive  models  or  basis  for  normative  models  of  the  human  decision  maker.  We  assume 
that  the  human  implements  some  algorithm  when  he  selects  actions  based  on  observations,  and 
this algorithm consists of parts possibly including a prediction part, an updating part, and a
decision rule part. These parts, though similar in function to their counterparts in the normative
algorithm,  differ  in  significant  ways  and  it  is  these  differences  which  give  rise  to  the  sub-normative 
behavior  of  the  human  decision  maker. 

A  proposal  for  a  decision  aid  is  to  provide  a  normatively  derived  state  estimate  to  the  operator 
that  he  can  use  to  augment  his  own  sub-normative  state  estimate.  Presumably,  the  decision 
maker's internal state estimate when the aid is present will be "closer" to the normative state
estimate than that which he forms in the unaided case, and the decisions which are made will be
better  in  an  expected  value  sense.  The  first  experiments  were  designed  to  show  that  a  decision  aid 
based  on  providing  normative  estimates  of  process  state  could  be  of  use  to  human  decision  makers. 
The  next  sections  will  describe  the  experimental  task  that  was  chosen  and  some  results. 


4.  EXPERIMENT  ONE  -  A  THREE  STATE  TRACKING  TASK 


In  order  to  evaluate  the  usefulness  of  a  normative  state  estimate  as  a  decision  aid,  and  to 
study  human  decision  making  in  a  controlled  environment,  a  simple  partially  observable  Markov 
process  was  constructed  with  three  states,  three  outputs,  and  two  control  actions.  The  process 
parameters were chosen so that the overall task had the characteristics of a single degree of
freedom, first order tracking experiment where the plant was slightly non-deterministic and the
observations were noisy. The exact process parameters which were used are given in Appendix 1.

The subject sat in front of a computer generated display which displayed one of three possible
output tokens at the beginning of each timestep. He was instructed that these tokens gave an
indication of the state of an invisible process, and he was given the parameters of the process in the form
of a state-transition diagram. The condition where the output tokens were the sole indication of
state available to the subject was termed the "raw observables" condition since it corresponded to
the information normally present in an unaided process control situation.

In terms of the model outlined in Section 2, the output token is the y(t) which takes on some value
$y_i$, and the invisible state of the process is x(t) which presumably has some value $x_j$ at each time.

The  subject  responded  to  displayed  information  by  moving  a  toggle  switch  in  one  direction  or  the 
other  according  to  the  action  he  thought  would  maximize  his  incremental  score  received  during  a 
trial.  One  position  of  the  switch  corresponded  to  movement  of  the  state  in  one  direction,  while 
putting the switch in the other direction corresponded to movement of the state in the other
direction. The time remaining to make the decision as well as the decision being made were also
displayed on the screen. At the end of a trial, the total score received by the subject was
displayed. 

In  other  conditions,  a  normative  estimate  of  the  state  of  the  process  was  displayed  alongside  the 
raw  observable.  This  was  presented  in  the  form  of  a  bar  graph,  with  the  length  of  each  bar 
corresponding  to  the  estimated  probability  of  the  process  being  in  the  corresponding  state.  The 
normatively derived state information was termed "processed data." In some trials only the
processed data was presented, and in others it was presented together with the raw observable.

Subjects were graduate students in mechanical engineering, all familiar with the principles of
control systems. Before each trial within a condition, training trials were given in that condition
until  the  subject  felt  comfortable.  Typically,  the  experiment  lasted  approximately  two  and  one 
half  hours,  with  roughly  forty  percent  of  the  time  spent  in  training  by  the  subject. 

Subjects  were  given  all  of  the  process  parameters  as  numbers  at  the  beginning  of  the  experiment 
though  no  attempt  was  made  to  provide  them  with  a  rich  understanding  of  the  qualitative  process 
characteristics.  Each  trial  consisted  of  120  decisions,  spaced  at  intervals  ranging  from  0.25 
seconds  to  2.0  seconds  per  decision  but  constant  during  each  trial.  Each  process  output  presented 
to  the  subject  and  the  resulting  decision  made  by  the  subject  (the  toggle  position  at  the  end  of  a 
timestep)  were  recorded  for  later  analysis. 

Four parameters of the experiment varied for each trial. These were process uncertainty, output
(channel) uncertainty, process rate, and presence or absence of the decision aid. A large process
uncertainty, denoted $E_p$ for entropy of the process, corresponds to cases where transition probabilities
were not near zero or one. Hence knowledge of the present state for a process with a large process
uncertainty does not have much implication on knowledge of future states. The channel uncertainty is
denoted $E_c$ and indirectly describes how much information about the plant state is given by a
single observation.


                          Subject 1       Subject 2       Subject 3       F
Parameter        Value    Mean   StDev    Mean   StDev    Mean   StDev    Samp    Crit

Observer         No       48.5   13.6     45.5   10.4     40.6   9.3      7.78*   6.85
                 Yes      51.7   10.8     47.8   12.4     46.6   12.8

Plant Entropy    0.187    53.7   20.1     52.2   14.0     42.0   12.3     6.27*   4.79
                 0.333    45.7   9.1      43.3   3.3      39.5   8.6
                 0.418    46.2   5.2      40.3   5.7      40.3   5.7

Channel Entropy  0.131    53.9   7.8                                      3.79    3.95
                 0.391    47.8   14.7     45.4   9.9      40.4   9.5
                 0.651    48.0   10.9     47.2   12.7     41.7   10.0
                 0.911    44.5   17.4     43.6   7.5      39.7   8.3

Process Rate     0.133    43.9   4.7      41.6   6.7      44.1   6.1      2.61    3.48
                 0.267    43.6   13.4     38.8   5.3      40.1   6.8
                 0.533    55.0   17.9     48.6   10.6     41.4   11.7
                 1.067    51.7   11.0     50.9   11.8     36.8   10.1
                 2.117    47.0   9.8

TABLE  1.  Summary  of  Results  for  First  Experiment. 

A  (*)  indicates  that  the  null  hypothesis  -  decision  performance  is  independent  of  the 
parameter  value  -  is  rejected  at  the  0.99  level. 


The  experimental  test  matrix  and  the  raw  results  for  each  trial  and  subject  are  given  in  Appendix 
2.  The  summary  results  of  this  first  experiment  are  presented  in  Table  1.  The  score  presented  is 
calculated  by  taking  the  components  of  the  Bayesian  state  estimates  of  process  state  at  each  time 
step  and  summing  them  over  the  trial,  and  then  dividing  by  the  number  of  decisions  per  trial.  By 
this  method  an  estimate  of  the  long-term  average  score  for  the  subject  and  condition  is  calculated. 
The  results  indicate  that  the  normatively  derived  state  information  aided  decision  performance, 
but  it  may  be  surprising  how  slight  this  effect  is.  The  null  hypothesis,  "The  computer  aid  does  not 
improve  decision  making  performance"  was  tested,  and  was  rejected  with  an  F-statistic  of  7.78 
which  is  significant  at  the  0.99  level. 


4.1  THE  USE  OF  SIMPLIFIED,  "THOUGHTLESS"  CONTROL  ALGORITHMS 


The  inputs  selected  and  outputs  observed  up  until  time  t-dt  may  be  fed  into  any  model  of 
the  human  in  order  to  predict  his  action  selected  at  time  t.  Five  different  models  of  the  human 
decision  maker  were  examined  corresponding  roughly  to  the  normative  model  degraded  in  various 
ways.  The  models  which  were  used  were  arrived  at  by  examining  protocols  of  subjects. 


Model 1 consisted of a normative state estimator with a normative decision rule. Model 2
consisted of a normative state estimator with a "truncated state" decision rule (discussed below).
Model 3 consisted of a "decaying memory" state estimator (discussed below) with a normative
decision rule. Model 4 consisted of a decaying memory state estimator with a truncated state
decision rule. Finally, Model 5 was a direct input→output mapping model or stimulus→response
model, in which the human does not hold any belief state from decision to decision but merely
responds to the current observation according to some lookup table which he has derived either
from structural knowledge about the task or sufficient experience.

The normative state estimator and normative decision rule were taken directly from the normative
decision making algorithm presented earlier.

The truncated state decision rule was a model in which the state estimate of the process held by
the human is first truncated to be one of three statements: 1) the process is in state $x_1$, 2) the
process is in state $x_2$, or 3) the process is in state $x_3$. The truncation was made according to which
probability in the belief state was the highest; if $f_1 > f_2$ and $f_1 > f_3$ then statement (1) was used.
The action was then selected according to this belief, conceptually speaking from a lookup table
where each of the three statements results in a specific action. The lookup table used in the
model was: $x_1 \to u_2$ chosen; $x_3 \to u_1$ chosen; $x_2 \to u(t-dt)$ chosen.
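
The truncated state rule is simple enough to state as code. The sketch below reads the lookup table as: most likely state $x_1$ selects $u_2$, most likely state $x_3$ selects $u_1$, and most likely state $x_2$ repeats the previous action. Indices are zero-based, and the tie-breaking behavior (lowest state index wins) is an assumption, since the text does not say how ties were resolved.

```python
def truncated_state_action(belief, previous_action):
    """Truncate a three-component belief to its most likely state, then
    pick an action from a fixed lookup table.

    Actions: 0 stands for u_1, 1 stands for u_2.
    States:  0, 1, 2 stand for x_1, x_2, x_3.
    """
    # Point estimate: index of the largest belief component
    # (ties go to the lowest index; this is an assumption).
    most_likely = max(range(3), key=lambda i: belief[i])
    if most_likely == 0:
        return 1                 # x_1 -> choose u_2
    if most_likely == 2:
        return 0                 # x_3 -> choose u_1
    return previous_action       # x_2 -> repeat the last action
```

Note that all information about how confident the estimate is gets discarded: beliefs of (0.34, 0.33, 0.33) and (0.98, 0.01, 0.01) lead to the same action.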

The  decaying  memory  state  estimator  was  a  model  in  which  previous  observations  were  combined 
with  weighting  factors  that  decay  exponentially  as  the  observation  was  made  further  and  further 
in  the  past.  This  corresponded  to  ordinary  first  order  filtering  of  the  observations  to  arrive  at  a 
belief  state.  By  examining  the  predictive  capability  of  models  with  various  decay  rates,  a  good 
model  was  found  to  be  when  the  decay  rate  was  set  to  0.2  seconds. 
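
One possible form of such an estimator is sketched below as a first-order (exponentially weighted) filter over one-hot encoded output tokens. Several details are assumptions made for illustration: the 0.2 second figure is treated as a time constant `tau`, each token index is taken to point at the state of the same index, and the belief starts out uniform.

```python
import math


def decaying_memory_belief(observations, dt, tau=0.2, n_states=3):
    """Exponentially weighted average of past output tokens.

    observations: sequence of token indices, oldest first.
    dt:           time between decisions, in seconds.
    tau:          assumed decay time constant, in seconds.
    """
    keep = math.exp(-dt / tau)          # weight retained by older evidence
    belief = [1.0 / n_states] * n_states
    for token in observations:
        belief = [keep * b + (1.0 - keep) * (1.0 if i == token else 0.0)
                  for i, b in enumerate(belief)]
    return belief
```

The components always sum to one, because each step forms a convex combination of the old belief and a one-hot vector. With the shortest decision interval in the experiment (0.25 seconds) and `tau` of 0.2 seconds, the newest token carries a weight of about 0.71, so the estimate leans heavily on the current observation.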

Table 2 presents the results of comparing each model of the human with actual decisions made for
each subject. From these data, it is quite clear that of the five models examined and for this
experimental task, the ideal Bayesian or normative model is not a good predictor of human
decision making behavior. This is in agreement with much of the literature which has discovered that
people  do  not  combine  new  observations  with  probabilistic  statements  in  a  Bayesian  way.  Of  the 
remaining  four  models,  none  is  significantly  more  predictive  than  the  others.  Subjects’  protocols 
indicated  that  they  were  using  something  like  the  stimulus  response  model.  It  should  be 
emphasized  that  although  the  stimulus  response  model  is  a  good  descriptive  model  of  the  subjects’ 
behavior,  it  is  a  bad  prescriptive  model.  More  will  be  said  about  this  point  later. 


Model    Model                State            Control            Subject Number
Number   Name                 Estimate         Rule               1      2      3

1        Simple               -                -                  81.0   72.7   62.4
2        Optimal Bayesian     Bayesian         Optimal            59.8   51.9   56.3
3        Truncated Bayesian   Bayesian         Truncated State    82.5   76.3   65.3
4        Optimal Iconic       Iconic Memory    Optimal            80.4   70.4   61.3
5        Truncated Iconic     Iconic Memory    Truncated State    81.0   73.5   63.1

TABLE 2. Comparison of Predicted Actions to Recorded Actions.
Entries are percentage of decisions where decision predicted by model matched
decision made by subject.
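
The agreement measure in the table can be computed as sketched below, shown here together with a stimulus-response predictor of the lookup-table kind. The lookup table contents and the sample sequences in the usage note are hypothetical and are not taken from the recorded trials.

```python
def stimulus_response_predictions(observations, lookup):
    """Predict each action from the current output token alone, using a
    fixed token-to-action lookup table (no belief state is carried over)."""
    return [lookup[token] for token in observations]


def percent_match(predicted, recorded):
    """Percentage of decisions where the model's action equals the subject's."""
    if len(predicted) != len(recorded) or not recorded:
        raise ValueError("sequences must be non-empty and of equal length")
    hits = sum(1 for p, r in zip(predicted, recorded) if p == r)
    return 100.0 * hits / len(recorded)
```

For instance, with the hypothetical table `{0: 1, 1: 0, 2: 0}`, tokens `[0, 1, 2, 0]` give predictions `[1, 0, 0, 1]`; against recorded actions `[1, 0, 1, 1]` the agreement is 75 percent.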


5.  EXPERIMENT  TWO  -  TRAPPING  WITH  NEGATIVE  EVIDENCE  OF  STATE

In the first experiment we observed that people can adopt a simple rote procedure for
selecting actions in place of a detailed normative algorithm when they feel their interests are served by
such a substitution. In certain real conditions, this type of response may be encouraged. For
example, procedural handbooks in power plant control rooms and military command and control
systems both promote the "trained monkey" type of stimulus→response behavior from operators.
Though  this  type  of  behavior  may  be  desirable  in  some  circumstances,  this  work  seeks  to  describe 
behavior  where  the  cognitive  activities  play  a  larger  role  than  simple  translation  of  inputs  into 
outputs. 

Modifications to the experimental parameters were sought which would make it difficult for
subjects to apply simple rote procedures. Since the output matrix in the first experiment tracks
the state of the process, it was easy for the subject to adopt a simple state estimation method
such as: use the output token as the state of the process (believe whatever the output says). It
was decided that using an output matrix such that the observables gave negative evidence of state
would force the subject to hold a state estimate which was independent of the momentary "state"
of the output token.
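As an illustration of what negative evidence does to a normative estimate, the following minimal sketch applies a single Bayes update using the output matrix listed in the appendix for the second task (0.10 on the diagonal, 0.45 elsewhere). The function name, the uniform prior, the zero-based indexing, and the omission of the state transition step are assumptions made only for this example.

```python
# Output matrix for the second task, as listed in the appendix.
# OUTPUT[y][s] is read here as P(token y | state s); the matrix is symmetric,
# so the row/column convention does not affect the result.
OUTPUT = [[0.10, 0.45, 0.45],
          [0.45, 0.10, 0.45],
          [0.45, 0.45, 0.10]]


def bayes_update(prior, token):
    """Return the posterior over states after observing one output token."""
    unnormalized = [OUTPUT[token][s] * prior[s] for s in range(len(prior))]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


prior = [1.0 / 3.0] * 3             # assumed: no prior knowledge of the state
posterior = bayes_update(prior, 1)  # token 1 is the middle token (zero-based)
print(posterior)
```

Under these assumptions, seeing the middle token lowers the probability of the middle state to about 0.10 and raises each of the other two to about 0.45, so "believe whatever the output says" is exactly the wrong rule.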

With the process parameters used in the first experiment, it was difficult for the experimenters
to distinguish good human decision making behavior from a random policy. A random policy is one
where the action at each time step is chosen from a constant distribution, regardless of the state
information that has passed before the decision maker. With two actions available at each
timestep, the possible random policies may be parameterized by a single variable X, the
probability of taking action u1. The theoretical average score under that policy may then be
determined analytically. Doing so shows that the experiment chosen provides little difference
between the score of a "good" random decision maker (X = 0.5) and a normative decision maker. The
process parameters for the second experiment were chosen to increase the differences between
normative, or knowledgeable, behavior and random policies.

In Figure 4(a), the expected scores for three different decision makers are presented. The top
line is the score that would be achieved if full knowledge of the state were available. The state
transition matrix determines where this expected score will be in comparison to the reward
achieved in the goal state. The second line is the expected score of a normative decision maker,
and the bottom line is that for the random policy, shown for various values of the parameter X. In
Figure 4(b), we see the corresponding plot for the second experiment showing a larger difference
between the normative and chance policies.

The  second  experiment  may  be  described  qualitatively  as  follows:  Under  one  action  the  state  of 
the  process  changes  wildly.  Under  the  other  action  the  process  stays  in  the  same  state  for  long 
periods.  The  goal  of  the  operator  is  to  "capture"  the  process  in  the  goal  state  and  keep  the  process 
there  as  often  and  as  long  as  possible.  With  the  parameters  chosen,  any  random  decision  policy 
will  produce  the  same  average  score  of  twenty  percent  of  the  time  spent  in  the  goal  state,  while  the 
normative  control  will  average  approximately  forty-four  percent. 
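The twenty percent figure for random policies can be checked numerically. The sketch below mixes the two transition matrices listed in the appendix for the second task according to X and finds the long-run state distribution by power iteration. It assumes the matrices are column-stochastic (each column holds the probabilities of leaving one state, which is consistent with their columns summing to one); the function names and the iteration count are illustrative choices, not part of the original study.

```python
# Transition matrices for the second task, as listed in the appendix.
# Column j holds the probabilities of moving FROM state j (columns sum to 1).
PHI_U1 = [[0.96, 0.04, 0.02],
          [0.02, 0.92, 0.02],
          [0.02, 0.04, 0.96]]
PHI_U2 = [[0.50, 0.50, 0.25],
          [0.25, 0.00, 0.25],
          [0.25, 0.50, 0.50]]
REWARD = [0.0, 1.0, 0.0]  # the goal is the middle state


def mix(x):
    """Transition matrix under a random policy that takes u1 with probability x."""
    return [[x * PHI_U1[i][j] + (1.0 - x) * PHI_U2[i][j] for j in range(3)]
            for i in range(3)]


def stationary(phi, iterations=2000):
    """Approximate the long-run state distribution by repeated multiplication."""
    pi = [1.0 / 3.0] * 3
    for _ in range(iterations):
        pi = [sum(phi[i][j] * pi[j] for j in range(3)) for i in range(3)]
    return pi


def expected_score(x):
    """Long-run fraction of time spent in the goal state under random policy x."""
    pi = stationary(mix(x))
    return sum(r * p for r, p in zip(REWARD, pi))


for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, round(expected_score(x), 4))
```

With the listed matrices, every value of X yields a score of 0.20: the probability of entering the goal state from either neighbor is always one quarter of the probability of leaving it, whatever the mixture.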

The conditions of the experiment were the same as those of the first experiment with the following
exceptions. The decision time was held constant throughout the experiment at 1.0 seconds. Also,
the subjects worked with four separate conditions: 1) State of process directly observable,
2) Only output token displayed, 3) Output token plus decision aid, and 4) Decision aid only.

Figure 4. Expected Scores for Various Decision Makers. Panels: (a) Experiment 1; (b) Experiment 2.

The expected scores for various random and intelligent decision policies calculated
for the process parameters used in the first and second experiments.


5.1  THE  REJECTION  OF  SOUND  ADVICE  DUE  TO  INFORMATION  OVERLOAD 


Figure 5 presents results for two subjects which were somewhat typical. The condition when
the state is displayed, marked "control," established a base level of errors for the subject. The
condition when only raw observable information was presented is marked "raw only," and was the
hardest for all subjects. For some trials, only the aid was presented, and all subjects achieved
better scores under this condition. For the condition when both raw information and processed
information were available, some subjects seemed to use the aid while others seemed to ignore it,
judging by the scores which were received.

It is interesting that those subjects whose scores seemed to reflect use of the aid whenever
available also claimed they used the aid. They believed they could not do any better than the aid,
while the others believed their own methods of state estimation were superior to that provided as
an aid. This is in spite of the fact that all subjects were told the aid performed the state
estimation calculations correctly. Those that chose not to use the aid also complained that the
additional information was confusing and too much to handle.

This reinforces an important issue being raised by this work: when is more information (in the
form of an aid, etc.) too much for the operator to deal with? It has been shown that the amount of
information which a human may remember in making absolute judgements with respect to
one-dimensional stimuli is seven items, plus or minus two [Miller, 1967]. So how do we, as plant
engineers and scientists, expect the operator to deal with several hundred observable items at a
time? In the first experiment, we saw that the subjects adopted a simple input->output rule
instead of an elaborate normative procedure. This was largely a matter of convenience, but even
though the second experiment is small in number of states and observables, it is already almost
necessary for the subjects to perform some simplifications on the incoming data to process it at
all. The nature of these simplifications is what will be explored in the remainder of this paper.

Figure 5. Results Summary for Two Subjects in the Second Experiment. Panels: Results Summary for
Subject Two; Results Summary for Subject Four.

Subject two apparently did not use normatively derived state information when
raw information was also available, though his performance was improved when
only processed information was available. Subject four seemed to use processed
information whenever it was available.

5.2  PARTITIONING,  POINT  ESTIMATES,  AND  FOCUS 

From the second experiment, it appeared that subjects did use an explicit state representation;
many referred specifically to one in their own descriptions of "what they were doing." From
some of these statements, three ways in which the subjects appear to simplify their own cognitive
workload without incurring a severe penalty in performance have been developed. We will call
these partitioning, point estimates, and focus.

Partitioning is a method of representation simplification in which the number of possible values
which a state variable may take on in the representation is smaller than the number of values the
state variable may take on in the actual system. "Number of values in the actual system" refers to
an approximate notion of the number which system designers or well-informed operators would
consider the state variable to be able to take on. An example of partitioning is when a state
variable which is continuous, and so has infinitely many possible values, is represented as having
only a few possible values, say "high," "medium," and "low."

As a physical example, suppose the process under study is an automobile engine and the state
variable of interest is the level of oil in the engine. A normative state estimate would assign a
probability to each possible level of oil. It may well be that the person only distinguishes
between full, low, and empty - a partitioned state variable. A belief state on the partitioned
state space would assign probabilities to each of the three represented values, P(full), P(low),
P(empty).
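The oil level example can be sketched in a few lines. The code below collapses a belief over fine-grained oil levels onto the three partitioned values by summing probabilities within each cell. The six discrete levels, the probabilities, and the assignment of levels to "full", "low", and "empty" are invented purely for illustration.

```python
def partition_belief(belief, partition):
    """Collapse a belief over fine states onto a coarser set of labels.

    belief    -- dict mapping fine state -> probability
    partition -- dict mapping fine state -> coarse label
    """
    coarse = {}
    for state, prob in belief.items():
        label = partition[state]
        coarse[label] = coarse.get(label, 0.0) + prob
    return coarse


# Hypothetical oil levels (say, quarts remaining) with an illustrative belief.
belief = {0: 0.05, 1: 0.10, 2: 0.15, 3: 0.20, 4: 0.30, 5: 0.20}
partition = {0: "empty", 1: "low", 2: "low", 3: "full", 4: "full", 5: "full"}

coarse = partition_belief(belief, partition)
print(coarse)  # P(empty), P(low), P(full)
```

The partitioned belief is still a proper probability distribution; what is lost is any distinction between levels that fall in the same cell.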

The fuzzy set approach to modeling or control implicitly incorporates this type of simplification.
When fuzzy set membership functions are drawn for "tall men" vs. "short men" and actions are
selected on the basis of membership values in such sets, then all subtle variations in height have
been ignored except for the fact that an item has a varying degree of membership in one set or
another.

Point estimates of state are used to simplify the representation of a state estimate from a
distribution over a set of points to the belief in a single point as "the" state of the process.
This method of thought pervades failure analysis, where great efforts are put into finding "the"
cause of failure. In a failure analysis situation, a normative decision maker would instead
determine a probability for each of the failure modes (states) which he has modeled and act based
on this distribution.

Though  the  point-estimate  type  of  simplification  may  seem  like  a  harmless  or  natural 
simplification  to  make,  it  can  lead  to  very  poor  decision  making  behavior  even  in  static  decision 
environments.  When  a  probability  distribution  is  simplified  to  a  point  estimate,  it  is  equivalent  to 
assigning  the  probability  of  one  of  the  states  equal  to  one  and  the  probability  of  all  other  states 
equal  to  zero.  Thus  the  resulting  decision  algorithm  is  insensitive  to  the  probability  of  almost 
every  state.  When  the  probability  of  a  high-cost  state  is  abnormally  high  though  not  the  highest, 
the  simplified  decision  algorithm  will  not  account  for  this. 
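A small numerical case makes the danger concrete. In the sketch below, the belief, the two actions, and the cost table are all invented for illustration; the only point is that a rule acting on the single most probable state can disagree with a rule that weighs every state by its probability.

```python
# Illustrative numbers only: three plant states, two actions, made-up costs.
BELIEF = {"normal": 0.70, "degraded": 0.20, "critical": 0.10}
COST = {
    "continue": {"normal": 0.0, "degraded": 1.0, "critical": 100.0},
    "shut_down": {"normal": 5.0, "degraded": 5.0, "critical": 5.0},
}


def normative_action(belief, cost):
    """Pick the action with the lowest expected cost over the full belief."""
    def expected(action):
        return sum(belief[s] * cost[action][s] for s in belief)
    return min(cost, key=expected)


def point_estimate_action(belief, cost):
    """Pick the best action for the single most probable state only."""
    most_likely = max(belief, key=belief.get)
    return min(cost, key=lambda action: cost[action][most_likely])


print(normative_action(BELIEF, COST))
print(point_estimate_action(BELIEF, COST))
```

Here the expected cost of continuing is 10.2 against 5.0 for shutting down, so the full-belief rule shuts down, while the point-estimate rule sees only the "normal" state and continues.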

Focus occurs when some state variables of the process are ignored or the joint probability
distribution is not taken in full detail. So-called common mode failures may not be considered by
someone who focuses on one or a few state variables at a time. It is not clear exactly how this
type of simplification will affect decision making performance in general.
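One way to picture the common mode problem is to compare a joint distribution with what is left after looking at each variable separately. The sketch below uses a made-up joint belief over two hypothetical components, computes each marginal, and then multiplies the marginals as someone focusing on one variable at a time might implicitly do. All names and numbers are assumptions for illustration.

```python
# Hypothetical joint belief over two components; most failures occur together.
JOINT = {
    ("ok", "ok"): 0.90,
    ("ok", "failed"): 0.01,
    ("failed", "ok"): 0.01,
    ("failed", "failed"): 0.08,
}


def marginal(joint, index):
    """Belief about one component alone, ignoring the other."""
    out = {}
    for states, prob in joint.items():
        key = states[index]
        out[key] = out.get(key, 0.0) + prob
    return out


pump_a = marginal(JOINT, 0)
pump_b = marginal(JOINT, 1)

# Treating the components separately amounts to multiplying the marginals.
focused_estimate = pump_a["failed"] * pump_b["failed"]
true_joint = JOINT[("failed", "failed")]
print(focused_estimate, true_joint)
```

With these numbers each component fails with probability 0.09, so the product of marginals puts a double failure near 0.008, roughly a tenth of the 0.08 that the joint distribution actually assigns.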


6. EXPERIMENT THREE - APPROXIMATION TO A CONTINUOUS
TASK


A third experiment has been constructed for the purposes of producing an explicit model of
the human's decision algorithm in a specific decision making environment. The approach in this
experiment has been to 1) construct a process with a much larger number of states than the
previous experiments, 2) record decisions made by subjects in response to process observables, and
3) compare subjects' decisions to predictions made by various explicit models of the human. The
explicit models of the human were based on the results of the previous experiments and analysis.

The process was given state transition behavior so that it behaved roughly as if it had two state
variables, position and velocity, and a force input. The process had eighty-one states providing
for nine levels of position and nine levels of velocity. Thus the process approximated a continuous
process. The transitions were chosen so that the state behaved as if it were a particle bouncing
between two walls, with some stochastic nature. The output took on one of eighty-one values at
each timestep, and the output probabilities were chosen so that the Euclidean distance between the
state and the output in the (position, velocity) state space was approximately normally
distributed. The reward was a maximum for the position of the particle in the middle, and decreased
linearly as the position increased or decreased away from the middle state. Appendix 3 contains
the precise state transition, output probability, and reward information.

Previously,  the  human’s  decision  algorithm  was  broken  down  into  two  separate  processes:  a  state 
estimation  process  and  a  decision  rule  process.  There  are  also  the  reward  function  and  belief  state 
which  are  considered  to  be  inputs  or  outputs  of  processes.  The  state  estimate  process,  the  decision 
rule  process,  and  the  reward  function  are  assumed  here  to  be  stationary  over  the  course  of  the 
experiment,  while  the  belief  state  is  assumed  to  be  transient  and  changes  from  decision  to  decision 
as  the  human  takes  observations  and  his  own  actions  into  account.  All  four  components  are 
assumed  to  exist  in  the  human  decision  making  algorithm.  We  will  focus  on  forming  successful 
models  of  the  three  stationary  components:  the  state  estimator,  the  decision  rule,  and  the  reward 
function. 

As models of the state estimation process, we have already presented several ways in which the
human may simplify a state representation before using it in a decision rule process. Hence
various combinations of these simplifications will result in several models of the state
estimation process. Six models were used in this experiment. The models were not meant to be
exhaustive or complete, but rather representative of the wide range of models which can be
produced by such a reticulation.

The normative belief state for this process would be a set of eighty-one probabilities, one for each
state. Figure 6 shows six displays which represent a normative belief state which has been
simplified in various ways (those used in the experiment).

In  addition  to  using  various  models  of  the  state  estimation  process,  two  different  models  of  the 
decision  rule  and  three  different  models  of  the  reward  function  were  used.  As  we  have  said,  the 
human  should  be  applying  the  expected  average  incremental  reward  algorithm  by  looking  at  the 
consequences  of  each  possible  action  infinitely  far  into  the  future.  This  is  one  model  of  the  human 
decision  rule.  The  other  model  is  when  the  human  considers  the  results  of  his  action  only  for  the 
next  time  step.  These  two  models  may  be  called  long  range  versus  short  sighted  decision  making. 
The three reward function models used correspond to 1) that which is given, 2) one in which the
task is essentially to avoid states far from the target state, and 3) one in which the task becomes
to get in the target state as often as possible. These are diagrammed in Figure 7.
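The short-sighted decision rule can be sketched as a one-step lookahead: predict the belief one timestep ahead under each action and pick the action with the highest expected reward. For brevity the example uses the three-state, two-action process whose matrices are listed in the appendix for the second task rather than the eighty-one-state process of this experiment; the function names and the column-stochastic reading of the matrices are assumptions. The long-range rule is not shown.

```python
# Transition matrices and reward for the three-state task, as listed in the
# appendix. Column j holds the probabilities of moving FROM state j.
PHI = {
    "u1": [[0.96, 0.04, 0.02],
           [0.02, 0.92, 0.02],
           [0.02, 0.04, 0.96]],
    "u2": [[0.50, 0.50, 0.25],
           [0.25, 0.00, 0.25],
           [0.25, 0.50, 0.50]],
}
REWARD = [0.0, 1.0, 0.0]


def predict(belief, action):
    """Belief over states one timestep ahead if the given action is taken."""
    phi = PHI[action]
    return [sum(phi[i][j] * belief[j] for j in range(3)) for i in range(3)]


def one_step_reward(belief, action):
    """Expected reward at the next timestep only."""
    return sum(r * p for r, p in zip(REWARD, predict(belief, action)))


def short_sighted_action(belief):
    """Choose the action that maximizes reward one step ahead."""
    return max(PHI, key=lambda action: one_step_reward(belief, action))


print(short_sighted_action([0.0, 1.0, 0.0]))  # believed to be in the goal state
print(short_sighted_action([1.0, 0.0, 0.0]))  # believed to be outside the goal
```

Under these assumptions the rule holds the process still (u1) when it is believed to be in the goal state and stirs it (u2) when it is believed to be elsewhere, which matches the qualitative "capture" description of that task.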

The results of applying this method of algorithm identification are summarized for one subject in
Table 3. There we can see that none of the models stands out as being highly predictive of the
decisions made by the subject, though we could interpret this as meaning that the human uses
such a simplified estimate that we are unable to observe any differences. Note however that when
the results are grouped by decision rule, the results are much more predictive for the
short-sighted decision rule.

Figure 6. Various Displays Used in Third Experiment. Panels: (a) No Simplification;
(b) Partitioning; (e) Partitioning + Focus; (f) Partitioning + Point Estimates.

Belief state representations may be simplified in various ways. These six ways were
used as possible models of the human's state estimation process in dynamic decision
making.


In accordance with these findings, an ad hoc model of one particular human subject was developed
which combines the simplest state estimation model with a very simple lookup table (in our case
derived from experimental data). The use of this model to predict the human's actions is
summarized in Table 4. There we see that an overall predictability of 69% is achieved compared to a
maximum predictability of 57% using the other decision rules.

Figure 7. Models of the Subject's Internal Reward Function. Each panel plots reward against
position, x.

These correspond to (a) given reward function, (b) avoiding states far from target
state, and (c) hitting target as much as possible.

         Decision Rule #1                Decision Rule #2
Model    Util      Util      Util       Util      Util      Util
Number   Func #1   Func #2   Func #3    Func #1   Func #2   Func #3

0        42.5      43.6      41.7       40.6      41.4      41.1
1        51.1      51.9      50.3       49.4      57.2      55.6
2        42.2      43.6      39.7       40.6      40.8      40.8
3        50.8      51.9      50.0       48.6      57.2      55.6
4        38.0      37.8      39.2       37.2      21.9      41.1
5        51.7      52.5      50.8       49.1      57.2      55.8

TABLE 3. Predictability of Various Models in Third Experiment.
Scores are in percentages of decisions which were correctly predicted by indicated
models.

One explanation of these results could be the following. The human subject took the verbal
instructions and immediately formed a decision rule such as that given by the histogram. In fact,
under condition 0 this subject said he would look at what quadrant the peak was in, and act based
on this result. His histogram data tend to support this. Figure 8 is a histogram of a point
estimate of state, indexed on each of the three actions. Figure 9 presents what the histogram for
the normative algorithm would look like.

What is interesting is that the subject's histogram is exactly that which would be produced by a
"good" decision maker for a similar task without bouncing walls, without the stochastic parts of
the process, and for "average" process parameters given the rest of the process structure. It seems
that the subject may be relating the task description to something with which he is familiar, then
forming a decision rule lookup table from this.

Experimental    Predictability
Condition       (percent)

0               63.3
1               75.0
2               76.7
3               81.7
4               73.3
5               73.3
Overall:        69.4

TABLE 4. Predictive Capabilities of Ad Hoc Model.

Scores tabulated by experimental condition.

Since the true test of any model is its ability to predict in unknown circumstances, the following
experiment is suggested. If the person is merely relating the task description to some internal,
archetypical task and choosing a decision rule based on this, then his histogram should be constant
regardless of the exact task description given (at least until experience shows him that his model
is poor). Therefore, give subjects widely varying task descriptions and apply the identification
techniques developed to produce histograms. This model of the human would predict that despite
variations in the process description, the subjects would all produce roughly the same action
histogram. This should be the approach in further experiments.


Figure 8. Summary of Actions Taken by Subject. Three panels (action u1 taken, action u2 taken,
action u3 taken), each plotting estimated velocity against estimated position.

The values of hypothesized internal belief state variables are indexed on the action
which was taken.

Figure 9. Approximate Histogram of Normative Decision Maker.

If the decision maker takes into account the bouncy walls of the process, the
corners of the action histogram differ considerably from those produced by subjects.


7.  CONCLUSIONS 


A variety of modeling methods have been employed in attempts to converge upon a description
of a human's mental model. To this end, a general class of experimental tasks was chosen
(which can apply to a wide variety of real tasks) which consists of regulating a dynamic process
having both deterministic and stochastic state transitions. The partially observable Markov
model is an appealing model to use in such experimental settings because it is general yet
tractable. Despite the ease with which a normative decision algorithm may be specified for this
type of process, the implementation of a normative algorithm becomes extremely difficult or
impossible as the process considered has more states.

Three experiments were reported in which the human apparently simplified his own mental task
greatly without sacrificing much in terms of overall task performance. In the first experiment, the
subjects appeared to simplify state estimation out of the experiment and perform a simple (and
fairly successful) input->output translation. In the second experiment, evidence showed additional
ways that the subject could simplify his internal state estimate to reduce his mental workload, yet
his decision performance was not substantially diminished. In the third experiment, the human
appeared to use a simple lookup-table on a simplified belief state instead of the complex
decision-rule-applied-to-belief-state-and-reward-function which is suggested by normative models,
and his performance again was relatively good. There is also evidence that this mapping is formed
at the outset from structural descriptions of the experiment rather than during experimental
trials where a process model is gradually changed based on empirical evidence.

8. APPENDIX 1 - PROCESS PARAMETERS FOR FIRST EXPERIMENTAL
TASK


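\Phi(u_1) = \begin{bmatrix} p_1 & 0 & 0 \\ 1-p_1 & p_2 & 0 \\ 0 & 1-p_2 & 1 \end{bmatrix}

\Phi(u_2) = \begin{bmatrix} 1 & 1-p_2 & 0 \\ 0 & p_2 & 1-p_1 \\ 0 & 0 & p_1 \end{bmatrix}

Output matrix:

P = \begin{bmatrix} q & (1-q)/2 & (1-q)/2 \\ (1-q)/2 & q & (1-q)/2 \\ (1-q)/2 & (1-q)/2 & q \end{bmatrix}

R = [ 0.0  1.0  0.0 ]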

Values for "p" used in experiment:

p        Er
0.0574   0.187
0.1599   0.333
0.3707   0.418

Values for "q" used in experiment:

q        E,
0.9741   0.131
0.8938   0.391
0.7764   0.651
0.5896   0.911

Expected scores for various decision makers:

DM with knowledge of state:      37.8 %
Normative decision procedure:    35.6 %
Random policy with X = 0.0        0.0 %
                       0.1        6.4 %
                       0.2       12.8 %
                       0.3       18.5 %
                       0.4       22.4 %
                       0.5       23.8 %
                       0.6       22.4 %
                       0.7       18.5 %
                       0.8       12.8 %
                       0.9        6.4 %
                       1.0        0.0 %

9. APPENDIX 2 - RAW SCORES BY SUBJECT FOR EXPERIMENT ONE


Scores indicate percentage of time spent in target state.
42.2
48.2
48.7
29.0
46.2
52.5
0.533
43.0
64.0
45.2
41.5
56.5
42.2
48.5
50.0
41.5
1.067
56.0
44.0
49.0
50.7
45.0
40.7
0.267
43.5
44.5
44.0
39.0
46.0
27.3
1.067
26.0
41.0
30.0
0.911
42.0
45.0
42.2
37.5
44.5
44.7
20.0
0.133
52.0
54.0
42.7
41.5
0.267
45.5
36.0
45.0
0.533
44.0
46.5
45.3
1.067
48.0
47.3
46.0
10. APPENDIX 3 - PROCESS PARAMETERS FOR SECOND EXPERIMENTAL
TASK


\Phi(u_1) = \begin{bmatrix} 0.96 & 0.04 & 0.02 \\ 0.02 & 0.92 & 0.02 \\ 0.02 & 0.04 & 0.96 \end{bmatrix}

\Phi(u_2) = \begin{bmatrix} 0.50 & 0.50 & 0.25 \\ 0.25 & 0.00 & 0.25 \\ 0.25 & 0.50 & 0.50 \end{bmatrix}

P = \begin{bmatrix} 0.10 & 0.45 & 0.45 \\ 0.45 & 0.10 & 0.45 \\ 0.45 & 0.45 & 0.10 \end{bmatrix}

R = [ 0.0  1.0  0.0 ]

Expected scores for various decision makers:

DM with knowledge of state:      75.7 %
Normative decision procedure:    44.8 %
Any random policy:               20.0 %